Relative phase information for detecting human speech and spoofed speech

Authors

Longbiao Wang

Yohei Yoshida

Yuta Kawakami

Seiichi Nakagawa

Abstract

The detection of human versus spoofed (synthetic or converted) speech has started to receive more attention. In this study, relative phase information extracted from the Fourier spectrum is used to distinguish human speech from spoofed speech. Because the original (natural) phase information is almost entirely lost in speech produced by current synthesis and conversion techniques, a feature based on the modified group delay, the frequency derivative of the phase spectrum, has been shown to be effective for this task. The modified group delay based feature, however, contains both magnitude spectrum and phase information. Relative phase information, which contains only phase information, is therefore expected to achieve better spoofing detection performance. In this study, the relative phase information is also combined with Mel-frequency cepstral coefficients (MFCC) and modified group delay. The proposed method was evaluated on the “ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge” dataset. The results show that the proposed relative phase information significantly outperforms MFCC and modified group delay: the equal error rate (EER) was reduced from 1.74% for MFCC and 0.83% for modified group delay to 0.013% for relative phase. Combining the relative phase with MFCC and modified group delay reduced the EER further, to 0.002%.
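The results above are reported as equal error rates. As a rough illustration of what that metric measures, here is a minimal sketch of computing an EER from detector scores. It assumes that a higher score means "more likely human" and that a trial is accepted as human when its score is at or above the threshold. The function name `equal_error_rate` and the toy scores are hypothetical; this is not the scoring tool used in the ASVspoof 2015 challenge or by the authors.

```python
def equal_error_rate(human_scores, spoofed_scores):
    """Return (eer, threshold) for a human-vs-spoofed detector.

    Assumption: higher scores mean "more likely human", and a trial is
    accepted as human when score >= threshold.
    """
    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(human_scores) | set(spoofed_scores))
    best = None
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as human.
        far = sum(s >= t for s in spoofed_scores) / len(spoofed_scores)
        # False rejection rate: human trials rejected as spoofed.
        frr = sum(s < t for s in human_scores) / len(human_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores for illustration only (not data from the paper).
    human = [0.9, 0.8, 0.4]
    spoofed = [0.6, 0.2, 0.1]
    eer, threshold = equal_error_rate(human, spoofed)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports the average of the false acceptance and false rejection rates at the threshold where they are closest. Other implementations interpolate between thresholds instead.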

Related articles

Anti-spoofing system: an investigation of measures to detect synthetic and human speech

Automatic Speaker Verification (ASV) systems are prone to spoofing attacks of various kinds. In this study, we explore the effects of different features and spoofing algorithms on a state-of-the-art i-vector speaker verification system. Our study is based on the standard dataset and evaluation protocols released as part of the ASVspoof 2015 challenge. We compare how different features perform w...

Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech

A speaker verification system should include effective precautions against malicious spoofing attacks, and although some initial countermeasures have been recently proposed, this remains a challenging research problem. This paper investigates discrimination between spoofed and genuine speech, as a function of frequency bands, across the speech bandwidth. Findings from our investigation inform s...

The Effect of Comprehensible Input and Comprehensible Output on the Accuracy and Complexity of Iranian EFL Learners’ Oral Speech

This study aimed at investigating the relative impact of comprehensible input and comprehensible output on the development of grammatical accuracy and syntactic complexity of Iranian EFL learners’ oral production. Participants were 60 female EFL learners selected from a whole population pool of 80 based on the standard test of IELTS. To investigate the research questions, the participants were ...

Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech

Speech synthesis and voice conversion techniques can pose threats to current speaker verification (SV) systems. For this purpose, it is essential to develop front end systems that are able to distinguish human speech vs. spoofed speech (synthesized or voice converted). In this paper, for the ASVspoof 2015 challenge, we propose a detector based on combination of cochlear filter cepstral coeffici...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Publication date: 2015
